337 samples and 7577 SNPs from across the northern Gulf and western Atlantic; 4356 SNPs remain after LD thinning.


Population structure

PCA

First, just run a basic PCA with all samples, colored by region:


PCA colored by depth

PCA colored by distance from shore

We get clustering by shallow vs. deep (nearshore vs. offshore) rather than Atlantic vs. Gulf. This suggests that nearshore animals in the Atlantic and Gulf are more similar to each other than geographically proximate nearshore and offshore animals are.


Checking for an inversion

The patterns above look like they could be caused by an inversion. If that were the case, we’d expect the SNPs with high PCA loadings to cluster in a single region of the genome. They don’t: the loadings are distributed across the genome.
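One rough way to put a number on this is to ask what share of the highest-loading SNPs falls on each chromosome. The sketch below uses made-up chromosome labels and loadings; the function name `top_loading_share` and the `top_n` cutoff are illustrative assumptions, not part of the analysis above.

```python
from collections import Counter

def top_loading_share(chroms, loadings, top_n=100):
    """Fraction of the top_n SNPs (ranked by |loading|) found on each chromosome."""
    ranked = sorted(zip(chroms, loadings), key=lambda cl: abs(cl[1]), reverse=True)
    top = ranked[:top_n]
    counts = Counter(chrom for chrom, _ in top)
    return {chrom: n / len(top) for chrom, n in counts.items()}

# Toy data (hypothetical): six SNPs on three chromosomes.
chroms = ["chr1", "chr1", "chr2", "chr2", "chr3", "chr3"]
loadings = [0.9, -0.8, 0.1, 0.7, -0.05, 0.02]

share = top_loading_share(chroms, loadings, top_n=3)
print(share)
```

An inversion would tend to show up as one chromosome (or one region) holding most of the top SNPs. Shares that roughly track the number of SNPs per chromosome are more consistent with genome-wide structure.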




Admixture

Running multiple values of K, with the cross-validation error for each. The lowest error indicates the most likely K. Cross-validation withholds a subset of the genotypes, predicts their values, and compares the predictions to the withheld data.
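Picking K from the cross-validation errors can be sketched as follows. This assumes each run's log contains a line of the form `CV error (K=3): 0.50`; the helper names and the error values are invented for illustration and are not results from this dataset.

```python
import re

# Pattern for lines such as "CV error (K=3): 0.50" (assumed log format).
CV_LINE = re.compile(r"CV error \(K=(\d+)\):\s*([0-9]*\.?[0-9]+)")

def cv_errors(log_text):
    """Map each K to its cross-validation error."""
    return {int(k): float(err) for k, err in CV_LINE.findall(log_text)}

def best_k(errors):
    """Return the K with the lowest cross-validation error."""
    if not errors:
        raise ValueError("no CV error lines found")
    return min(errors, key=errors.get)

# Invented values, for illustration only.
example_log = """\
CV error (K=1): 0.60
CV error (K=2): 0.52
CV error (K=3): 0.50
CV error (K=4): 0.55
"""

errors = cv_errors(example_log)
print(best_k(errors))  # 3
```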

The plot below is sorted with Atlantic on the left and Gulf on the right, then shallow to deep by collection location.

Note that the left group in the Atlantic is the nearshore individuals, the bottom-left cluster in the PCAs above.

Ordered shallow to deep, after subsetting to account for unequal sample sizes:


Population structure with populations inferred from Admixture

NJ tree of pairwise genetic distances



Population assignment conclusion

  • Both Admixture and DAPC (not shown) indicate 4 populations as the most likely. They also agree on individual assignments for all but ~5 individuals, who are all fairly admixed.
  • The four populations are:
    • Coastal Gulf
    • Coastal Atlantic
    • Intermediate
    • Offshore

For the rest of the analyses, I’ll run both the four population analysis as well as a six population analysis, splitting the intermediate and offshore into Gulf and Atlantic. I think this makes sense biologically and is justified given our hypotheses going into the analysis.



Isolation by distance:

Next, testing whether there is isolation by distance (IBD) at various scales. For all tests I’m using PCA-based genetic distance with 64 PCs (based on Shirk et al.), but these values are very similar to the Plink and Euclidean distances.
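As I understand the PCA-based distance, it is the Euclidean distance between individuals in the space of the retained PC axes. A minimal sketch under that assumption, with a hypothetical function name and made-up scores:

```python
import math

def pc_distance_matrix(pc_scores, n_pcs=64):
    """Pairwise Euclidean distance between individuals using the first n_pcs PC scores."""
    trimmed = [row[:n_pcs] for row in pc_scores]
    return [[math.dist(a, b) for b in trimmed] for a in trimmed]

# Toy scores (hypothetical): 3 individuals, 2 PCs.
scores = [[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]]
dist = pc_distance_matrix(scores)
print(dist[0][1], dist[0][2])  # 5.0 10.0
```

With real data, `pc_scores` would hold one row per individual and at least 64 columns; the toy example only has two so the arithmetic is easy to check by hand.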

IBD across all individuals

Across all individuals, there is an IBD signal. But I think this is likely driven by the population structure underlying the pairwise comparisons.
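These notes don’t say which test sits behind the IBD signal. One common choice is a Mantel test, which correlates the genetic and geographic distance matrices and permutes individuals to get a p-value. The sketch below is a plain-Python version under that assumption; the function names, the toy positions, and the permutation count are all illustrative.

```python
import random
from statistics import mean

def upper(m):
    """Flatten the upper triangle (i < j) of a square matrix."""
    n = len(m)
    return [m[i][j] for i in range(n) for j in range(i + 1, n)]

def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def mantel(gen, geo, n_perm=999, seed=1):
    """One-sided Mantel test: returns (observed r, permutation p-value)."""
    rng = random.Random(seed)
    n = len(gen)
    geo_vec = upper(geo)
    r_obs = pearson(upper(gen), geo_vec)
    idx = list(range(n))
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(idx)  # relabel individuals in the genetic matrix
        permuted = [[gen[idx[i]][idx[j]] for j in range(n)] for i in range(n)]
        if pearson(upper(permuted), geo_vec) >= r_obs:
            hits += 1
    return r_obs, (hits + 1) / (n_perm + 1)

# Toy example (hypothetical): four individuals along a transect,
# with genetic distance exactly proportional to geographic distance.
positions = [0.0, 1.0, 2.0, 4.0]
geo = [[abs(a - b) for b in positions] for a in positions]
gen = [[0.1 * d for d in row] for row in geo]

r, p = mantel(gen, geo)
print(round(r, 3), p)
```

A Mantel test on pooled individuals is still sensitive to population structure, which is why the population-level splits are worth running.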


IBD: Four population analysis

IBD: Six population analysis

Split this out into each individual population across both regions. Within a population we generally see IBD, except for Atlantic Offshore. Inter-population comparisons have no signal.

Tt312 and 13Tt073 are divergent and are causing the odd pattern in Offshore Atlantic. I’m not sure why. There may be an argument for dropping them, but I don’t think they look unusual in other analyses. Need to double-check missing data, etc. for these individuals.


IBD: Atlantic and Gulf

IBD: Combining Atlantic and Gulf

IBD: Comparing only between populations


IBD conclusion

  • Strong signal of IBD within populations in both the four and six population analyses.
  • Between the Gulf and Atlantic, there is relatively little IBD.



Introgression and hybrids

The Intermediate individuals are possibly hybrids: in the results above, they’re intermediate in genetic distance for nearly all analyses. Here, I’ll test this using:

  1. f3 statistics
  2. D-/f4 statistics
  3. treemix
  4. triangle plots
  5. new hybrids

f3 statistics:

The f3-statistic explicitly tests whether a taxon of interest results from admixture between two others: a significantly negative f3-statistic supports the admixture hypothesis, while a positive value is not informative. In our case, the taxon of interest (pop1) is Intermediate, while pop2 and pop3 are Coastal and Offshore.
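To see why a negative value points to admixture: f3 is the average, across SNPs, of (c - a)(c - b), where c is the allele frequency in the target and a and b are the frequencies in the two candidate sources. The sketch below is a naive version with no finite-sample bias correction and no standard error, so it is not what produced the tables in these notes; the allele frequencies are made up.

```python
def naive_f3(target, source_a, source_b):
    """Mean of (c - a) * (c - b) across SNPs; no bias correction."""
    products = [(c - a) * (c - b) for c, a, b in zip(target, source_a, source_b)]
    return sum(products) / len(products)

# Made-up allele frequencies at three SNPs in two source populations.
a = [0.1, 0.8, 0.5]
b = [0.9, 0.2, 0.5]

# A target that is a 50/50 mixture sits between the sources at every SNP.
mixed = [(x + y) / 2 for x, y in zip(a, b)]
f3_mixed = naive_f3(mixed, a, b)

# A target that has drifted beyond both sources gives a positive value.
drifted = [0.0, 1.0, 0.5]
f3_drifted = naive_f3(drifted, a, b)

print(f3_mixed < 0, f3_drifted > 0)  # True True
```

When the target lies between the two sources, (c - a) and (c - b) have opposite signs, so the product is negative. Drift in the target adds a positive term, which is why a positive f3 does not rule out admixture.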

First, I calculated these statistics with the four populations assignments:

F3-Statistics with Four Populations
pop1 pop2 pop3 est se z p
Intermediate Coastal_Atl Offshore 0.0037947 0.0006491 5.8461604 0.0000000
Intermediate Coastal_Gulf Offshore -0.0005488 0.0005773 -0.9506458 0.3417842

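Remember, positive values are not informative; negative values indicate a population resulting from admixture. Neither comparison is significantly negative: the only negative estimate (Coastal_Gulf and Offshore as sources) has p = 0.34.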
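The z and p columns in the four-population table are consistent with z = est / se and a two-sided normal p-value. The sketch below checks that, assuming that is how they were computed; the function name is mine and the inputs are the est and se values from the table.

```python
import math

def z_and_p(est, se):
    """z-score and two-sided normal p-value for an estimate and its standard error."""
    z = est / se
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p

# est and se from the four-population f3 table (target: Intermediate).
rows = {
    "Coastal_Atl, Offshore": (0.0037947, 0.0006491),
    "Coastal_Gulf, Offshore": (-0.0005488, 0.0005773),
}

results = {name: z_and_p(est, se) for name, (est, se) in rows.items()}
for name, (z, p) in results.items():
    print(f"{name}: z = {z:.3f}, p = {p:.4f}")
```

Only a negative z with a small p counts as evidence for admixture here. The large positive z in the first row is significant but uninformative.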

Next, I split into 6 populations:

F3-Statistics with Six Populations
pop1 pop2 pop3 est se z p
Intermediate Atlantic
Intermediate_Atlantic Coastal_Atl Offshore_Atlantic 0.0057962 0.0007623 7.6034085 0.0000000
Intermediate_Atlantic Coastal_Atl Offshore_Gulf 0.0057595 0.0006787 8.4857907 0.0000000
Intermediate_Atlantic Coastal_Gulf Offshore_Atlantic 0.0006596 0.0006667 0.9893106 0.3225112
Intermediate_Atlantic Coastal_Gulf Offshore_Gulf 0.0020004 0.0005960 3.3563314 0.0007898
Intermediate Gulf
Intermediate_Gulf Coastal_Atl Offshore_Atlantic 0.0004012 0.0007493 0.5354332 0.5923504
Intermediate_Gulf Coastal_Atl Offshore_Gulf 0.0002826 0.0006526 0.4330949 0.6649458
Intermediate_Gulf Coastal_Gulf Offshore_Atlantic -0.0044799 0.0006952 -6.4443067 0.0000000
Intermediate_Gulf Coastal_Gulf Offshore_Gulf -0.0032210 0.0006094 -5.2856212 0.0000001

This indicates that the Intermediate Gulf population is the result of admixture between Coastal Gulf and both the Offshore Atlantic and Offshore Gulf populations. There is no evidence of admixture for the Intermediate Atlantic population.



D-/f4 statistics:

D-statistics, or ABBA-BABA tests, test for introgression by looking for deviations from the allele-sharing pattern expected under incomplete lineage sorting alone. In short, take a tree (((P1,P2),P3),O), where O is the outgroup, with an ancestral “A” allele and a derived “B” allele. With incomplete lineage sorting and no gene flow, the “ABBA” and “BABA” site patterns should occur at equal frequencies. An overrepresentation of either ABBA or BABA suggests gene flow (see figure below, from the Dsuite tutorial).

Example of ABBA BABA test

I ran this test with Dsuite, with Aduncus as the outgroup. In the output below, P1 and P2 are always arranged so that D is positive, which indicates gene flow between P2 and P3. Swapping P1 and P2 would just flip the sign of D; a negative D indicates gene flow between P1 and P3.
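The D value itself is just the normalized difference between the two pattern counts, D = (ABBA - BABA) / (ABBA + BABA). A small check using the ABBA and BABA values from the first row of the four-population table in this section (P1 = Coastal_Atl, P2 = Coastal_Gulf, P3 = Intermediate); the function name is mine:

```python
def d_statistic(abba, baba):
    """Patterson's D from ABBA and BABA pattern counts."""
    return (abba - baba) / (abba + baba)

# ABBA and BABA from the Coastal_Atl / Coastal_Gulf / Intermediate row.
abba, baba = 127.419, 103.333

d = d_statistic(abba, baba)
d_swapped = d_statistic(baba, abba)  # swapping P1 and P2 swaps ABBA and BABA

print(round(d, 4), round(d_swapped, 4))  # 0.1044 -0.1044
```

This reproduces the reported Dstatistic of 0.10438 and shows the sign flip when P1 and P2 are exchanged. It does not reproduce the Z score, which depends on a standard error estimated separately.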

D-Statistics with four Populations
P1 P2 P3 Dstatistic Z.score p.value BBAA ABBA BABA p.value_multTesting
Coastal_Atl Coastal_Gulf Intermediate 0.1043800 8.80336 0.0000000 148.398 127.419 103.3330 0.0000000
Coastal_Atl Coastal_Gulf Offshore 0.0712124 5.25906 0.0000001 219.905 100.627 87.2480 0.0000006
Intermediate Coastal_Atl Offshore 0.0239574 1.42199 0.1550280 193.157 105.564 100.6240 0.6201120
Intermediate Coastal_Gulf Offshore 0.0930599 5.88660 0.0000000 205.884 107.584 89.2652 0.0000000
  • The strongest evidence is for gene flow between the Coastal Gulf and Intermediate populations (row 1).
  • There is also gene flow between Coastal Gulf and Offshore.
D-Statistics with six Populations
P1 P2 P3 Dstatistic Z.score p.value BBAA ABBA BABA p.value_multTesting
Coastal_Atl Coastal_Gulf Intermediate_Atlantic 0.1043000 8.8125300 0.0000000 146.849 127.6930 103.5720 0.0000000
Coastal_Atl Coastal_Gulf Intermediate_Gulf 0.1039980 8.3291000 0.0000000 154.582 126.1860 102.4120 0.0000000
Coastal_Atl Coastal_Gulf Offshore_Atlantic 0.0651194 4.4594000 0.0000082 229.247 96.8881 85.0410 0.0001644
Coastal_Atl Coastal_Gulf Offshore_Gulf 0.0761205 5.8891200 0.0000000 211.909 103.7900 89.1063 0.0000001
Intermediate_Atlantic Coastal_Atl Intermediate_Gulf 0.0034596 0.2326380 0.8160420 132.862 123.9690 123.1140 1.0000000
Intermediate_Atlantic Coastal_Atl Offshore_Atlantic 0.0198779 1.0956600 0.2732280 202.461 101.4630 97.5079 1.0000000
Intermediate_Atlantic Coastal_Atl Offshore_Gulf 0.0274589 1.6885100 0.0913126 187.978 108.4510 102.6550 1.0000000
Intermediate_Gulf Coastal_Atl Offshore_Atlantic 0.0199902 0.9907250 0.3218200 195.146 102.9920 98.9549 1.0000000
Intermediate_Gulf Coastal_Atl Offshore_Gulf 0.0273693 1.4770900 0.1396520 180.630 109.9980 104.1370 1.0000000
Offshore_Gulf Coastal_Atl Offshore_Atlantic 0.0418859 1.8357300 0.0663969 136.578 115.2220 105.9580 1.0000000
Intermediate_Atlantic Coastal_Gulf Intermediate_Gulf 0.1052520 8.0421600 0.0000000 138.552 129.3110 104.6830 0.0000000
Intermediate_Atlantic Coastal_Gulf Offshore_Atlantic 0.0830860 4.8449300 0.0000013 216.279 102.9970 87.1945 0.0000253
Intermediate_Atlantic Coastal_Gulf Offshore_Gulf 0.1016310 6.3494300 0.0000000 199.961 110.9970 90.5169 0.0000000
Intermediate_Gulf Coastal_Gulf Offshore_Atlantic 0.0823751 4.6976200 0.0000026 208.444 104.3550 88.4711 0.0000526
Intermediate_Gulf Coastal_Gulf Offshore_Gulf 0.1006260 6.2146000 0.0000000 192.076 112.3530 91.8088 0.0000000
Offshore_Gulf Coastal_Gulf Offshore_Atlantic 0.0973529 4.4513000 0.0000085 143.189 118.9830 97.8718 0.0001707
Intermediate_Gulf Intermediate_Atlantic Offshore_Atlantic 0.0004192 0.0332968 0.9734380 192.947 97.6398 97.5580 1.0000000
Intermediate_Gulf Intermediate_Atlantic Offshore_Gulf 0.0003095 0.0245152 0.9804420 178.944 103.3700 103.3060 1.0000000
Offshore_Gulf Intermediate_Atlantic Offshore_Atlantic 0.0241332 1.4828900 0.1381040 132.257 112.6530 107.3440 1.0000000
Offshore_Gulf Intermediate_Gulf Offshore_Atlantic 0.0242528 1.4013900 0.1610960 130.058 110.3830 105.1550 1.0000000
  • There is gene flow between Coastal Gulf and both Intermediate Atlantic and Intermediate Gulf.
  • There is gene flow between Coastal Gulf and both Offshore populations.
  • No signal of gene flow between the Intermediate and Offshore populations.



Treemix

Maximum likelihood tree estimating drift among populations. Migration edges are then fit to the tree to improve the fit of populations that the tree model explains poorly. Migration edges get added stepwise. You can estimate the number of migration events that best improves the model fit, similar to Evanno-type approaches for STRUCTURE.

no migration

First fit the trees with 0 migration events:

Four populations, no migration events

Six populations, no migration events

Adding migration events:

Best number of migration events:

  • Four populations: 2 migration events
  • Six populations: 4 migration events

The four population result is consistent and clear. There are two migration edges: one between Intermediate and Offshore, and one between the Intermediate/Coastal Gulf node and Offshore. This tree is well supported and consistent across runs (of 100 runs, nearly all show this exact tree, below).

In contrast, with 6 populations, things are much more uncertain and unstable. For the most likely tree (top left, in figure below) there are migration edges between the Intermediate populations and the branch leading to the Coastal populations. There is also migration from the Coastal Gulf/Intermediate node to both Offshore populations. The next 5 most likely trees show similar variations on these migration events. Note that the positions of the Coastal and Intermediate populations are unstable across runs. This maybe isn’t shocking, given that these populations aren’t well supported in the other analyses.

Results that are consistent:

  • Offshore Gulf and Offshore Atl are always sister
  • There is no migration with the coastal Atlantic population.
  • There is ample migration between Intermediate and Offshore populations.

Triangle plots

The basic idea behind these is that we can identify early-generation hybrids by both their ancestry and their (interclass) heterozygosity. We treat SNPs with large allele frequency differences between the parental populations (> 0.7; 0.8 and 0.9 give similar results) as Ancestry Informative Markers (AIMs). An F1 hybrid would then have a hybrid index of 0.5 based on these AIMs (50% of alleles from each parental population). We also calculate the proportion of AIMs in the putative hybrids that are heterozygous for ancestry, i.e. carry one allele from each parent. For an F1, all loci would be heterozygous, so this value would be 1. For an F2, this heterozygosity would drop to ~0.5, and it continues to drop if there is backcrossing.
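The two axes of the triangle plot can be sketched as below. This is a simplified illustration, not the code used for the plots: it assumes genotypes at AIMs are already coded as the number of alleles assigned to parent 2 (0, 1, or 2), treats every heterozygote as one allele from each parent (only strictly true for fixed differences), and uses hypothetical function names and toy genotypes.

```python
def is_aim(freq_parent1, freq_parent2, threshold=0.7):
    """True if the allele frequency difference between parental populations exceeds the threshold."""
    return abs(freq_parent1 - freq_parent2) > threshold

def triangle_coords(aim_genotypes):
    """Return (hybrid index, interclass heterozygosity) for one individual.

    aim_genotypes: per-AIM count (0, 1, 2) of parent-2 alleles; None marks missing data.
    """
    called = [g for g in aim_genotypes if g is not None]
    if not called:
        raise ValueError("no called genotypes at AIMs")
    hybrid_index = sum(called) / (2 * len(called))
    heterozygosity = sum(1 for g in called if g == 1) / len(called)
    return hybrid_index, heterozygosity

# Toy individuals (hypothetical genotypes at four AIMs).
f1 = [1, 1, 1, 1]        # one allele from each parent at every AIM
f2_like = [0, 1, 1, 2]   # half of the AIMs heterozygous
parent2 = [2, 2, None, 2]

print(triangle_coords(f1))       # (0.5, 1.0)
print(triangle_coords(f2_like))  # (0.5, 0.5)
print(triangle_coords(parent2))  # (1.0, 0.0)
```

The F1 lands at the apex of the triangle, the F2-like individual at the same hybrid index but half the heterozygosity, and the pure parent at a bottom corner.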

Here’s a nice paper that shows expectations for different scenarios. In short, if we follow the expectation curve in the plot below, it is likely due to admixture and not isolation by distance or a similar process. In contrast, there should be no relationship between hybrid index and heterozygosity when admixture has not occurred and IBD is the main feature of the data https://onlinelibrary.wiley.com/doi/10.1111/1755-0998.14039

Four population assignment



Six population assignment



There are no F1 hybrids in these data, but the rest of the variation is likely due to admixture, not neutral IBD.







Still to do

  1. Genomic cline analysis
    • using introgress, bgc, or similar (from Gompert).
  2. Gene-environment associations
  3. General selection scans within species.